{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Serving a Model as an HTTPS Endpoint\n",
    "\n",
    "Choosing between real-time and batch predictions depends on the application and business context. Are we optimizing for latency or for throughput? Does the application need our models to scale automatically throughout the day to handle cyclic traffic? Do we plan to compare models in production through A/B tests?\n",
    "\n",
    "If the application requires low latency, we should deploy the model as a real-time API that serves individual prediction requests over HTTPS. SageMaker Endpoints let us deploy, scale, and compare our model prediction servers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"img/sagemaker-architecture.png\" width=\"80%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "\n",
    "sess   = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "training_job_name='gold-training-master'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gold-training-master\n"
     ]
    }
   ],
   "source": [
    "print(training_job_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Copy the Model to the Notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Completed 972 Bytes/972 Bytes (11.5 KiB/s) with 1 file(s) remaining\r",
      "download: s3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-06-04-05-39-51-084/output/model.tar.gz to ./model.tar.gz\r\n"
     ]
    }
   ],
   "source": [
    "!aws s3 cp s3://$bucket/tensorflow-training-2020-06-04-05-39-51-084/output/model.tar.gz ./model.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n"
     ]
    }
   ],
   "source": [
    "!tar -xvzf ./model.tar.gz "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 261816\r\n",
      "drwxrwxr-x 20 ec2-user ec2-user      4096 Aug 17 04:46 \u001b[0m\u001b[01;34m.\u001b[0m/\r\n",
      "drwxrwxr-x 16 ec2-user ec2-user      4096 Aug 12 00:12 \u001b[01;34m..\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user      2371 Aug 14 15:33 00_Overview.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user      8719 Jul 25 21:29 01_Invoke_SageMaker_Autopilot_Model_From_Athena.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user     16646 Aug 12 00:07 02_Deploy_Reviews_BERT_PyTorch_REST_Endpoint.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user      9472 Aug 16 20:36 03_Deploy_Reviews_BERT_TensorFlow_REST_Endpoint.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user     34818 Aug 12 00:08 04_Perform_AB_Test_Reviews_BERT_TensorFlow_REST_Endpoints.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user     11328 Aug 10 06:21 05_Deploy_Reviews_BERT_TensorFlow_Batch_Predictions_TSV.ipynb\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user     30675 Aug 17 04:45 99_Add_InferenceDotPy_To_Model.ipynb\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Jul 29 02:17 \u001b[01;34mbatch_prediction_output\u001b[0m/\r\n",
      "drwxr-xr-x  2 ec2-user ec2-user      4096 Aug 10 06:46 \u001b[01;34mcode.orig\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user      1302 Aug 11 23:51 config.json\r\n",
      "drwxrwxr-x  3 ec2-user ec2-user      4096 Jul 25 21:29 \u001b[01;34mdata\u001b[0m/\r\n",
      "drwxrwxr-x  5 ec2-user ec2-user      4096 Jul 25 21:29 \u001b[01;34mdata-tfrecord\u001b[0m/\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Jul 25 21:29 \u001b[01;34mdocker\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user       322 Aug 12 00:09 .gitignore\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Aug 13 01:44 \u001b[01;34mimg\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user        53 Aug 11 23:51 index_to_name.json\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Aug 17 03:12 \u001b[01;34m.ipynb_checkpoints\u001b[0m/\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Aug 11 23:51 \u001b[01;34mMAR-INF\u001b[0m/\r\n",
      "drwxr-xr-x  2 ec2-user ec2-user      4096 Aug 10 04:15 \u001b[01;34mmetrics\u001b[0m/\r\n",
      "drwxrwxr-x  4 ec2-user ec2-user      4096 Aug 17 03:05 \u001b[01;34mmodel\u001b[0m/\r\n",
      "drwxrwxr-x  3 ec2-user ec2-user      4096 Aug 11 02:46 \u001b[01;34mmodels\u001b[0m/\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Aug 11 23:51 \u001b[01;34mmodel_store\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user       972 Jun  4 18:13 \u001b[01;31mmodel.tar.gz\u001b[0m\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user 267852933 Aug 11 23:51 pytorch_model.bin\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user       171 Aug 11 23:51 setup_config.json\r\n",
      "drwxrwxr-x  2 ec2-user ec2-user      4096 Jul 29 01:27 \u001b[01;34msrc_batch_tsv\u001b[0m/\r\n",
      "drwxrwxr-x  5 ec2-user ec2-user      4096 Jul 25 21:29 \u001b[01;34msrc_torchserve\u001b[0m/\r\n",
      "drwxr-xr-x  2 ec2-user ec2-user      4096 Aug 10 04:11 \u001b[01;34mtensorboard\u001b[0m/\r\n",
      "drwxr-xr-x  3 ec2-user ec2-user      4096 Jun  4 05:45 \u001b[01;34mtensorflow\u001b[0m/\r\n",
      "-rw-rw-r--  1 ec2-user ec2-user      7790 Aug 11 23:51 Transformer_handler_generalized.py\r\n",
      "drwxr-xr-x  3 ec2-user ec2-user      4096 Jun  4 05:45 \u001b[01;34mtransformers\u001b[0m/\r\n",
      "drwxrwxr-x 13 ec2-user ec2-user      4096 Aug 17 04:44 \u001b[01;34mwip\u001b[0m/\r\n"
     ]
    }
   ],
   "source": [
    "ls -al ./"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 8\r\n",
      "drwxr-xr-x 2 ec2-user ec2-user 4096 Jun  4 05:45 .\r\n",
      "drwxr-xr-x 3 ec2-user ec2-user 4096 Jun  4 05:45 ..\r\n"
     ]
    }
   ],
   "source": [
    "!ls -al model/tensorflow/saved_model/0\n",
    "#!mv ./tensorflow/saved_model/ saved_model/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls ./saved_model/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!saved_model_cli show --all --dir ./tensorflow/saved_model/0/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Show `inference.py`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mjson\u001b[39;49;00m\r\n",
      "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36msubprocess\u001b[39;49;00m\r\n",
      "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36msys\u001b[39;49;00m\r\n",
      "subprocess.check_call([sys.executable, \u001b[33m'\u001b[39;49;00m\u001b[33m-m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mpip\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33minstall\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mtensorflow==2.1.0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m])\r\n",
      "subprocess.check_call([sys.executable, \u001b[33m'\u001b[39;49;00m\u001b[33m-m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mpip\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33minstall\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mtransformers==2.8.0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m])\r\n",
      "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtensorflow\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36mtf\u001b[39;49;00m\r\n",
      "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtransformers\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m DistilBertTokenizer\r\n",
      "\r\n",
      "classes=[\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m, \u001b[34m3\u001b[39;49;00m, \u001b[34m4\u001b[39;49;00m, \u001b[34m5\u001b[39;49;00m]\r\n",
      "\r\n",
      "max_seq_length=\u001b[34m128\u001b[39;49;00m\r\n",
      "\r\n",
      "tokenizer = DistilBertTokenizer.from_pretrained(\u001b[33m'\u001b[39;49;00m\u001b[33mdistilbert-base-uncased\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\r\n",
      "\r\n",
      "\u001b[34mdef\u001b[39;49;00m \u001b[32minput_handler\u001b[39;49;00m(data, context):\r\n",
      "    transformed_instances = []\r\n",
      "\r\n",
      "    \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mDATA \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m.format(data))\r\n",
      "\r\n",
      "    \u001b[34mfor\u001b[39;49;00m instance \u001b[35min\u001b[39;49;00m data:\r\n",
      "        data_str = instance.decode(\u001b[33m'\u001b[39;49;00m\u001b[33mutf-8\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\r\n",
      "        \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mDATA_STR \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m.format(data_str))\r\n",
      "        \r\n",
      "        tokens = tokenizer.tokenize(data_str)\r\n",
      "        \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mTOKENS \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m.format(tokens))\r\n",
      "\r\n",
      "        encode_plus_tokens = tokenizer.encode_plus(data_str,\r\n",
      "                                                   pad_to_max_length=\u001b[34mTrue\u001b[39;49;00m,\r\n",
      "                                                   max_length=max_seq_length)\r\n",
      "\r\n",
      "        \u001b[37m# Convert the text-based tokens to ids from the pre-trained BERT vocabulary\u001b[39;49;00m\r\n",
      "        input_ids = encode_plus_tokens[\u001b[33m'\u001b[39;49;00m\u001b[33minput_ids\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m]\r\n",
      "        \u001b[37m# Specifies which tokens BERT should pay attention to (0 or 1)\u001b[39;49;00m\r\n",
      "        input_mask = encode_plus_tokens[\u001b[33m'\u001b[39;49;00m\u001b[33mattention_mask\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m]\r\n",
      "        \u001b[37m# Segment Ids are always 0 for single-sequence tasks (or 1 if two-sequence tasks)\u001b[39;49;00m\r\n",
      "        segment_ids = [\u001b[34m0\u001b[39;49;00m] * max_seq_length\r\n",
      "    \r\n",
      "        transformed_instance = { \r\n",
      "                                 \u001b[33m\"\u001b[39;49;00m\u001b[33minput_ids\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: input_ids, \r\n",
      "                                 \u001b[33m\"\u001b[39;49;00m\u001b[33minput_mask\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: input_mask, \r\n",
      "                                 \u001b[33m\"\u001b[39;49;00m\u001b[33msegment_ids\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: segment_ids\r\n",
      "                               }\r\n",
      "    \r\n",
      "        transformed_instances.append(transformed_instance)\r\n",
      "\r\n",
      "    transformed_data = {\u001b[33m\"\u001b[39;49;00m\u001b[33minstances\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: transformed_instances}\r\n",
      "    \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mtransformed_data \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m.format(transformed_data))\r\n",
      "\r\n",
      "    \u001b[34mreturn\u001b[39;49;00m json.dumps(transformed_data)\r\n",
      "\r\n",
      "\r\n",
      "\u001b[34mdef\u001b[39;49;00m \u001b[32moutput_handler\u001b[39;49;00m(response, context):\r\n",
      "    response_json = response.json()\r\n",
      "\r\n",
      "    log_probabilities = response_json[\u001b[33m\"\u001b[39;49;00m\u001b[33mpredictions\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\r\n",
      "\r\n",
      "    predicted_classes = []\r\n",
      "\r\n",
      "    \u001b[34mfor\u001b[39;49;00m log_probability \u001b[35min\u001b[39;49;00m log_probabilities:\r\n",
      "        softmax = tf.nn.softmax(log_probability)    \r\n",
      "        predicted_class_idx = tf.argmax(softmax, axis=-\u001b[34m1\u001b[39;49;00m, output_type=tf.int32)\r\n",
      "        predicted_class = classes[predicted_class_idx]\r\n",
      "        predicted_classes.append(predicted_class)\r\n",
      "    \r\n",
      "    predicted_classes_json = json.dumps(predicted_classes)    \r\n",
      "    \u001b[36mprint\u001b[39;49;00m(predicted_classes_json)\r\n",
      "\r\n",
      "    response_content_type = context.accept_header\r\n",
      "\r\n",
      "    \u001b[34mreturn\u001b[39;49;00m predicted_classes_json, response_content_type\r\n"
     ]
    }
   ],
   "source": [
    "!pygmentize ./code/inference.py"
   ]
  },
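  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `output_handler` above turns each row of raw model scores into a star rating by applying a softmax and taking the arg max. Below is a minimal sketch of that mapping in plain Python, without TensorFlow, assuming the five classes listed in `inference.py`. The helper names `softmax` and `predict_star_ratings` are hypothetical and not part of the deployed code.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "# Same label set as `classes` in inference.py\n",
    "CLASSES = [1, 2, 3, 4, 5]\n",
    "\n",
    "def softmax(scores):\n",
    "    # Subtract the max score for numerical stability before exponentiating\n",
    "    top = max(scores)\n",
    "    exps = [math.exp(s - top) for s in scores]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "def predict_star_ratings(predictions):\n",
    "    # `predictions` holds one list of raw scores per input review\n",
    "    ratings = []\n",
    "    for scores in predictions:\n",
    "        probabilities = softmax(scores)\n",
    "        best_idx = probabilities.index(max(probabilities))\n",
    "        ratings.append(CLASSES[best_idx])\n",
    "    return ratings\n",
    "```\n",
    "\n",
    "Softmax preserves the ordering of the scores, so the arg max of the raw scores selects the same class. The softmax step only matters if the probabilities themselves are returned to the caller."
   ]
  },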
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gzip: ./model.tar.gz: No such file or directory\r\n"
     ]
    }
   ],
   "source": [
    "!gunzip ./model.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model.tar\r\n"
     ]
    }
   ],
   "source": [
    "!ls model.tar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "code/inference.py\r\n"
     ]
    }
   ],
   "source": [
    "!tar -uvf model.tar code/inference.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "transformers/\r\n",
      "transformers/fine-tuned/\r\n",
      "tensorflow/\r\n",
      "tensorflow/saved_model/\r\n",
      "tensorflow/saved_model/0/\r\n",
      "code/inference.py\r\n"
     ]
    }
   ],
   "source": [
    "!tar -xzvf model.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 4260\r\n",
      "drwxr-xr-x 4 ec2-user ec2-user    4096 Jun  4 05:45 .\r\n",
      "drwxr-xr-x 3 ec2-user ec2-user    4096 Jun  4 05:45 ..\r\n",
      "drwxr-xr-x 2 ec2-user ec2-user    4096 Aug 10 04:15 assets\r\n",
      "-rw-r--r-- 1 ec2-user ec2-user 4342992 Aug 10 04:15 saved_model.pb\r\n",
      "drwxr-xr-x 2 ec2-user ec2-user    4096 Aug 10 04:15 variables\r\n"
     ]
    }
   ],
   "source": [
    "!ls -al ./tensorflow/saved_model/0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gzip model.tar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model.tar.gz\r\n"
     ]
    }
   ],
   "source": [
    "!ls model.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Completed 2.0 KiB/2.0 KiB (27.2 KiB/s) with 1 file(s) remaining\r",
      "upload: ./model.tar.gz to s3://sagemaker-us-east-1-835319576252/gold-training-master/output/model.tar.gz\r\n"
     ]
    }
   ],
   "source": [
    "!aws s3 cp ./model.tar.gz s3://$bucket/$training_job_name/output/model.tar.gz"
   ]
  },
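  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `gunzip`, `tar -u`, and `gzip` steps above add `code/inference.py` to the training artifact. The same repackaging can be sketched with Python's standard `tarfile` module. This is an illustration under assumptions: the function names `build_model_archive` and `list_archive` are hypothetical, and the layout (a `tensorflow/` directory plus `code/inference.py`) simply mirrors the listing in this notebook.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tarfile\n",
    "\n",
    "def build_model_archive(model_dir, inference_script, archive_path='model.tar.gz'):\n",
    "    # Write a fresh gzip-compressed tar so each path appears exactly once\n",
    "    with tarfile.open(archive_path, 'w:gz') as archive:\n",
    "        # Keep the directory name at the archive root, e.g. tensorflow/saved_model/0/...\n",
    "        root_name = os.path.basename(os.path.normpath(model_dir))\n",
    "        archive.add(model_dir, arcname=root_name)\n",
    "        # Place the handler script under code/\n",
    "        archive.add(inference_script, arcname='code/inference.py')\n",
    "    return archive_path\n",
    "\n",
    "def list_archive(archive_path):\n",
    "    # Return the member names so the layout can be checked before uploading\n",
    "    with tarfile.open(archive_path, 'r:gz') as archive:\n",
    "        return archive.getnames()\n",
    "```\n",
    "\n",
    "Calling `list_archive` on the result before the `aws s3 cp` upload is a cheap way to confirm that both the SavedModel files and `code/inference.py` are present."
   ]
  },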
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy the Model\n",
    "This will create a default `EndpointConfig` with a single model.  \n",
    "\n",
    "The next notebook demonstrates more advanced `EndpointConfig` strategies that support canary rollouts and A/B testing.\n",
    "\n",
    "_Note:  If not using a US-based region, you may need to adapt the container image to your current region using the following table:_\n",
    "\n",
    "https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gm-tf-1597634989\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "timestamp = int(time.time())\n",
    "\n",
    "gold_training_master_model_name = '{}-{}-{}'.format('gm', 'tf', timestamp)\n",
    "\n",
    "print(gold_training_master_model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parameter image will be renamed to image_uri in SageMaker Python SDK v2.\n"
     ]
    }
   ],
   "source": [
    "from sagemaker.tensorflow.serving import Model\n",
    "\n",
    "gold_training_master = Model(name=gold_training_master_model_name,\n",
    "                             model_data='s3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name),\n",
    "                             role=role,                \n",
    "                             framework_version='2.1.0')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gold-training-master-tf-1597634989\n"
     ]
    }
   ],
   "source": [
    "gold_training_master_endpoint_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)\n",
    "\n",
    "print(gold_training_master_endpoint_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.\n"
     ]
    }
   ],
   "source": [
    "predictor = gold_training_master.deploy(endpoint_name=gold_training_master_endpoint_name,\n",
    "                                        initial_instance_count=1, # use >= 2 for higher availability\n",
    "                                        instance_type='ml.c5.9xlarge', # requires enough disk space for tensorflow, transformers, and bert downloads\n",
    "                                        wait=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints/gold-training-master-tf-1597634989\">SageMaker REST Endpoint</a></b>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"_blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}\">SageMaker REST Endpoint</a></b>'.format(region, gold_training_master_endpoint_name)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "waiter = sm.get_waiter('endpoint_in_service')\n",
    "waiter.wait(EndpointName=gold_training_master_endpoint_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# _Wait Until the ^^ Endpoint ^^ is Deployed_"
   ]
  },
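  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The boto3 waiter above blocks until the endpoint is `InService`. For illustration, that kind of polling can be sketched by hand. This is a simplified sketch, not the waiter's actual implementation: `wait_for_endpoint` and its parameters are hypothetical, and `describe` stands for any callable that returns a dict with an `EndpointStatus` key, such as `sm.describe_endpoint`.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, max_attempts=120):\n",
    "    # Poll until the endpoint is in service, fails, or attempts run out\n",
    "    for _ in range(max_attempts):\n",
    "        status = describe(EndpointName=endpoint_name)['EndpointStatus']\n",
    "        if status == 'InService':\n",
    "            return status\n",
    "        if status == 'Failed':\n",
    "            raise RuntimeError('Endpoint {} failed to deploy'.format(endpoint_name))\n",
    "        time.sleep(poll_seconds)\n",
    "    raise TimeoutError('Endpoint {} not in service after {} attempts'.format(endpoint_name, max_attempts))\n",
    "```\n",
    "\n",
    "In this notebook it could be called as `wait_for_endpoint(sm.describe_endpoint, gold_training_master_endpoint_name)`. Passing the describe function in as an argument keeps the loop easy to exercise with a stub."
   ]
  },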
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Simulate a Prediction from an Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from sagemaker.tensorflow.serving import Predictor\n",
    "\n",
    "predictor = Predictor(endpoint_name=gold_training_master_endpoint_name,\n",
    "                      sagemaker_session=sess,\n",
    "                      content_type='application/json',\n",
    "                      model_name='saved_model',\n",
    "                      model_version=0)"
   ]
  },
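  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An application that does not use the SageMaker Python SDK can call the same endpoint through the `sagemaker-runtime` API in boto3. A hedged sketch follows: the helper names `build_invoke_request` and `invoke` are hypothetical, and the JSON list-of-strings body is an assumption that mirrors the `reviews` list this notebook passes to `predictor.predict`. The sketch also omits the `model_name` and `model_version` settings used by the `Predictor` above.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "def build_invoke_request(endpoint_name, reviews):\n",
    "    # Keyword arguments for the sagemaker-runtime invoke_endpoint call\n",
    "    return {\n",
    "        'EndpointName': endpoint_name,\n",
    "        'ContentType': 'application/json',\n",
    "        'Accept': 'application/json',\n",
    "        'Body': json.dumps(reviews),\n",
    "    }\n",
    "\n",
    "def invoke(endpoint_name, reviews, region_name=None):\n",
    "    # Imported here so the request builder can be used without boto3 installed\n",
    "    import boto3\n",
    "    runtime = boto3.client('sagemaker-runtime', region_name=region_name)\n",
    "    response = runtime.invoke_endpoint(**build_invoke_request(endpoint_name, reviews))\n",
    "    # The response body is a stream; output_handler returns a JSON list of classes\n",
    "    return json.loads(response['Body'].read())\n",
    "```\n",
    "\n",
    "Separating the request construction from the network call makes the payload easy to inspect or log before it is sent."
   ]
  },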
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Predict the `star_rating` with Ad Hoc `review_body` Samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reviews = [\"This is great!\"]\n",
    "\n",
    "predicted_classes = predictor.predict(reviews)\n",
    "\n",
    "for predicted_class, review in zip(predicted_classes, reviews):\n",
    "    print('[Predicted Star Rating: {}]'.format(predicted_class), review)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Predict the `star_rating` with `review_body` Samples from Our TSV Files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', \n",
    "                         delimiter='\\t', \n",
    "                         quoting=csv.QUOTE_NONE,\n",
    "                         compression='gzip')\n",
    "df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=100)\n",
    "df_sample_reviews = df_sample_reviews.reset_index()\n",
    "df_sample_reviews.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def predict(review_body):\n",
    "    return predictor.predict([review_body])[0]\n",
    "\n",
    "df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)\n",
    "df_sample_reviews"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Save for Next Notebook(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%store gold_training_master_model_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%store gold_training_master_endpoint_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%store"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Delete Endpoint\n",
    "To avoid ongoing charges, delete the endpoint when it is no longer needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sm.delete_endpoint(\n",
    "#      EndpointName=gold_training_master_endpoint_name\n",
    "# )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%javascript\n",
    "# Jupyter.notebook.save_checkpoint();\n",
    "# Jupyter.notebook.session.delete();"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
